Syndromic Classification of Twitter Messages
نویسندگان
چکیده
Recent studies have shown strong correlation between social networking data and national influenza rates. We expanded upon this success to develop an automated text mining system that classifies Twitter messages in real time into six syndromic categories based on key terms from a public health ontology. 10-fold cross validation tests were used to compare Naive Bayes (NB) and Support Vector Machine (SVM) models on a corpus of 7431 Twitter messages. SVM performed better than NB on 4 out of 6 syndromes. The best performing classifiers showed moderately strong F1 scores: respiratory = 86.2 (NB); gastrointestinal = 85.4 (SVM polynomial kernel degree 2); neurological = 88.6 (SVM polynomial kernel degree 1); rash = 86.0 (SVM polynomial kernel degree 1); constitutional = 89.3 (SVM polynomial kernel degree 1); hemorrhagic = 89.9 (NB). The resulting classifiers were deployed together with an EARS C2 aberration detection algorithm in an experimental online system.
منابع مشابه
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
متن کاملTwitter mining for fine-grained syndromic surveillance
BACKGROUND Digital traces left on the Internet by web users, if properly aggregated and analyzed, can represent a huge information dataset able to inform syndromic surveillance systems in real time with data collected directly from individuals. Since people use everyday language rather than medical jargon (e.g. runny nose vs. respiratory distress), knowledge of patients' terminology is essentia...
متن کاملYou Are What You Tweet: Analyzing Twitter for Public Health
Analyzing user messages in social media can measure different population characteristics, including public health measures. For example, recent work has correlated Twitter messages with influenza rates in the United States; but this has largely been the extent of mining Twitter for public health. In this work, we consider a broader range of public health applications for Twitter. We apply the r...
متن کاملSemantic Analysis of Open Source Data for Syndromic Surveillance
Objective The objective of this analysis is to leverage recent advances in natural language processing (NLP) to develop new methods and system capabilities for processing social media (Twitter messages) for situational awareness (SA), syndromic surveillance (SS), and event-based surveillance (EBS). Specifically, we evaluated the use of human-in-the-loop semantic analysis to assist public health...
متن کاملArtificial Prediction Markets as a tool for Syndromic Surveillance
A range of data sources across the internet, such as google search terms, twitter topics and Facebook messages, amongst others, can be viewed as kinds of sensors from which information might be extractable about trends in the expression of matters of concern to people. We focus on the problem of how to identify emerging trends after the original textual data has been processed into a quantitati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011